Goto

Collaborating Authors

 divide-and-conquer approach


A Divide-and-Conquer Approach for Modeling Arrival Times in Business Process Simulation

arXiv.org Artificial Intelligence

Business Process Simulation (BPS) is a critical tool for analyzing and improving organizational processes by estimating the impact of process changes. A key component of BPS is the case-arrival model, which determines the pattern of new case entries into a process. Although accurate case-arrival modeling is essential for reliable simulations--as it influences waiting and overall cycle times--existing approaches often rely on oversimplified static distributions of inter-arrival times. These approaches fail to capture the dynamic and temporal complexities inherent in organizational environments, leading to less accurate and reliable outcomes. To address this limitation, we propose Auto Time Kernel Density Estimation (AT-KDE), a divide-and-conquer approach that models arrival times of processes by incorporating global dynamics, day-of-week variations, and intraday distributional changes, ensuring both precision and scalability. Experiments conducted across 20 diverse processes demonstrate that AT-KDE is far more accurate and robust than existing approaches while maintaining sensible execution time efficiency.


Can Large Language Models do Analytical Reasoning?

arXiv.org Artificial Intelligence

This paper explores the cutting-edge Large Language Model with analytical reasoning on sports. Our analytical reasoning embodies the tasks of letting large language models count how many points each team scores in a quarter in the NBA and NFL games. Our major discoveries are in two folds. Firstly, we find among all the models we employed, GPT-4 stands out in effectiveness, followed by Claude-2.1, with GPT-3.5, Gemini-Pro, and Llama-2-70b lagging behind. Specifically, we compare three different prompting techniques and a divide-and-conquer approach, we find that the latter was the most effective. Our divide-and-conquer approach breaks down play-by-play data into smaller, more manageable segments, solves each piece individually, and then aggregates them together. Besides the divide-and-conquer approach, we also explore the Chain of Thought (CoT) strategy, which markedly improves outcomes for certain models, notably GPT-4 and Claude-2.1, with their accuracy rates increasing significantly. However, the CoT strategy has negligible or even detrimental effects on the performance of other models like GPT-3.5 and Gemini-Pro. Secondly, to our surprise, we observe that most models, including GPT-4, struggle to accurately count the total scores for NBA quarters despite showing strong performance in counting NFL quarter scores. This leads us to further investigate the factors that impact the complexity of analytical reasoning tasks with extensive experiments, through which we conclude that task complexity depends on the length of context, the information density, and the presence of related information. Our research provides valuable insights into the complexity of analytical reasoning tasks and potential directions for developing future large language models.


Oversampling Divide-and-conquer for Response-skewed Kernel Ridge Regression

arXiv.org Machine Learning

The divide-and-conquer method has been widely used for estimating large-scale kernel ridge regression estimates. Unfortunately, when the response variable is highly skewed, the divide-and-conquer kernel ridge regression (dacKRR) may overlook the underrepresented region and result in unacceptable results. We develop a novel response-adaptive partition strategy to overcome the limitation. In particular, we propose to allocate the replicates of some carefully identified informative observations to multiple nodes (local processors). The idea is analogous to the popular oversampling technique. Although such a technique has been widely used for addressing discrete label skewness, extending it to the dacKRR setting is nontrivial. We provide both theoretical and practical guidance on how to effectively over-sample the observations under the dacKRR setting. Furthermore, we show the proposed estimate has a smaller asymptotic mean squared error (AMSE) than that of the classical dacKRR estimate under mild conditions. Our theoretical findings are supported by both simulated and real-data analyses.


Top 10 Must-Know Machine Learning Algorithms for Data Scientists – Part 1 - KDnuggets

#artificialintelligence

Wading through the vast array of information for data science newcomers on machine learning algorithms can be a difficult and time-consuming process. Figuring out which algorithms are widely used and which are simply novel or interesting is not just an academic exercise; determining where to concentrate your time and focus during the early days of study can determine the difference between getting your career off to a quick start and experiencing an extended ground delay. How, exactly, does one discriminate between immediately useful algorithms worthy of attention and, well, not so much? Determining how to come up with an objective list of machine learning algorithms of authority is inherently difficult, but it seems that going directly to practitioners for their feedback may be the optimal approach. Such a process presents a whole host of difficulties of its own, as could be easily imagined, and results of such surveys are few and far between.


Divide-and-conquer methods for big data analysis

arXiv.org Machine Learning

In the context of big data analysis, the divide-and-conquer methodology refers to a multiple-step process: first splitting a data set into several smaller ones; then analyzing each set separately; finally combining results from each analysis together. This approach is effective in handling large data sets that are unsuitable to be analyzed entirely by a single computer due to limits either from memory storage or computational time. The combined results will provide a statistical inference which is similar to the one from analyzing the entire data set. This article reviews some recently developments of divide-and-conquer methods in a variety of settings, including combining based on parametric, semiparametric and nonparametric models, online sequential updating methods, among others. Theoretical development on the efficiency of the divide-and-conquer methods is discussed. Examples of real-world data analyses are provided in various application areas.


Fast and Accurate Refined Nyström-Based Kernel SVM

AAAI Conferences

In this paper, we focus on improving the performance of the Nyström based kernel SVM. Although the Nyström approximation has been studied extensively and its application to kernel classification has been exhibited in several studies, there still exists a potentially large gap between the performance of classifier learned with the Nyström approximation and that learned with the original kernel. In this work, we make novel contributions to bridge the gap without increasing the training costs too much by proposing a refined Nyström based kernel classifier. We adopt a two-step approach that in the first step we learn a sufficiently good dual solution and in the second step we use the obtained dual solution to construct a new set of bases for the Nyström approximation to re-train a refined classifier. Our approach towards learning a good dual solution is based on a sparse-regularized dual formulation with the Nyström approximation, which can be solved with the same time complexity as solving the standard formulation. We justify our approach by establishing a theoretical guarantee on the error of the learned dual solution in the first step with respect to the optimal dual solution under appropriate conditions. The experimental results demonstrate that (i) the obtained dual solution by our approach in the first step is closer to the optimal solution and yields improved prediction performance; and (ii) the second step using the obtained dual solution to re-train the model further improves the performance.


A Divide-and-Conquer Approach for Solving Interval Algebra Networks

AAAI Conferences

Deciding consistency of constraint networks is a fundamental problem in qualitative spatial and temporal reasoning. In this paper we introduce a divide-and-conquer method that recursively partitions a given problem into smaller sub-problems in deciding consistency. We identify a key theoretical property of a qualitative calculus that ensures the soundness and completeness of this method, and show that it is satisfied by the Interval Algebra (IA) and the Point Algebra (PA). We develop a new encoding scheme for IA networks based on a combination of our divide-and-conquer method with an existing encoding of IA networks into SAT. We empirically show that our new encoding scheme scales to much larger problems and exhibits a consistent and significant improvement in efficiency over state-of-the-art solvers on the most difficult instances.